- 3. Changes and Additions
-
- The major additions and changes for the basic services and
- tools of the Performance Co-Pilot are described in the
- following sections.
-
- Refer to the reference pages of the individual utilities for
- a complete description of any new functionality.
-
- 3.1 Infrastructure Changes
-
- The following changes have been made to the PCP
- infrastructure that affect both collector and monitor
- configurations.
-
-
- 3.1.1 Changes for IRIX 6.5
-
- The following incidents were resolved for IRIX 6.5.13.
-
- 817880 The remove command to pmafm(1) was not listing all
- of the files used by pmlogger(1) for PCP archive
- folios created with the ``record'' facility of the
- GUI tools.
-
- 803341 The default ``replay'' tool for PCP archive folios
- created by mkaf directly was changed from mkaf to
- pmchart. This makes the pmafm replay function more
- useful. The mkaf(1) man page was updated to be more
- precise about the interactions between mkaf and
- pmafm.
-
- The following incidents were resolved for IRIX 6.5.10.
-
- 794379 The routine responsible for parsing the PCP metrics
- namespace (pmLoadNameSpace(3)) incorrectly accepted
- hyphens in metric names.
-
- The following incidents were resolved for IRIX 6.5.8.
-
- 768814 Resolved some diskless install problems.
-
- 773035 The xvm PMDA exports mirror revive state
- information.
-
- The following incidents were resolved for IRIX 6.5.6.
-
- 764463 A new xvm PMDA was added to export performance
- statistics from the XVM volume manager.
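As an illustration of incident 794379 above, the following minimal sketch checks metric names against an assumed grammar: dot-separated components, each starting with a letter and continuing with letters, digits or underscores. The grammar and the function name is_valid_metric_name are assumptions made for this example; the notes above state only that hyphens are not valid, and this is not the pmLoadNameSpace(3) implementation.

```python
import re

# Assumed grammar for one name component: a letter followed by
# letters, digits or underscores. Hyphens are therefore rejected.
_COMPONENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if every dot-separated component fits the assumed grammar."""
    return all(_COMPONENT.fullmatch(part) is not None for part in name.split("."))
```

Under these assumptions, disk.dev.read_bytes is accepted while disk.dev.read-bytes is rejected.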
-
- 3.1.2 PCP 2.1 to PCP 2.2
-
- 1. PCP 2.2 for both IRIX and Linux is now built from a
- single source code base. While the list of features and
- packaging may be different between the distributions,
- you may see some evidence of minor changes that are a
- result of unifying the product development process for
- both platforms.
-
- 2. A standard set of environment variables is defined in
- /etc/pcp.conf and described in pcp.conf(4). These
- variables generally specify the location of various
- PCP components in the file system. They may be loaded
- into shell scripts by sourcing the /etc/pcp.env shell
- script (see pcp.env(4)) and queried by C/C++ programs
- using the __pmGetConfig(3) library function. See the
- PCPIntro(1) man page for further details.
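For tools that cannot source a shell script or call the C library, a configuration file of this kind can be read directly. The sketch below assumes the file consists of NAME=value lines with blank lines and '#' comments; that layout, the function name parse_pcp_conf and the variable PCP_EXAMPLE_DIR are assumptions for illustration only. pcp.conf(4) remains the authority on the actual format and variable names.

```python
def parse_pcp_conf(text: str) -> dict:
    """Parse NAME=value lines, skipping blank lines and '#' comments.

    The NAME=value layout is an assumption of this sketch.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            settings[name.strip()] = value.strip()
    return settings


# Hypothetical excerpt, not copied from a real /etc/pcp.conf.
sample = "# example only\nPCP_EXAMPLE_DIR=/some/where\n"
config = parse_pcp_conf(sample)
```

Shell scripts should still prefer sourcing /etc/pcp.env, and C/C++ programs the library function, as described above.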
-
- 3.1.3 PCP 2.0 to PCP 2.1
-
- 1. To help with PCP deployments on systems running
- operating systems other than IRIX, the Performance
- Metrics Name Space (PMNS) has been overhauled to
- remove the irix. prefix from the names of the
- system-centric performance metrics, e.g.
- irix.disk.dev.read_bytes has become
- disk.dev.read_bytes. In addition to changing the
- PMNS, translations are also handled dynamically in the
- PCP libraries, so all clients will continue to operate
- correctly using either the new or the old names. As a
- consequence no configuration files will need to be
- changed, and monitoring tools will work correctly in
- environments with a mixture of new and old style PMNS
- deployments.
-
- 2. The PCP inference engine pmie(1) has migrated from the
- pcp.sw.monitor subsystem to the pcp_eoe.sw.eoe
- subsystem, and the licensing restrictions have been
- relaxed to allow pmie to be used to monitor
- performance on the local host without any PCP
- licenses.
-
- 3. Support for running pmie(1) as a daemon has been
- added. This has many similarities to the pmcd(1) and
- pmlogger(1) daemon support: pmie can be controlled
- through the chkconfig(1) interface and the startup
- and shutdown script /etc/init.d/pmie, which supports
- starting and stopping multiple pmie instances
- monitoring one or more hosts. This is achieved with
- the assistance of another script, pmie_check(1), which
- is similar to the pmlogger support script
- pmlogger_check(1).
-
- 4. New capabilities have been added to assist in the
- estimation of PCP archive sizes. The -r option for
- pmlogger(1) causes the size of the physical record(s)
- for each group of metrics and the expected
- contribution of the group to the size of the PCP
- archive for one full day of collection to be reported
- in the log file. The -s option to pmdumplog(1) will
- report the size in bytes of each physical record in
- the archive.
-
- 5. Changes to pmlogger(1) have greatly reduced the size
- of the *.meta files created when logging metrics with
- instance domains that change over time.
-
- 6. As an aid to creating pmlogger configuration files,
- pmlogconf(1) is a new tool that allows selection of
- groups of commonly desired metrics and customization
- of pmlogger configurations from a simple interactive
- dialog.
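The daily archive-size estimate mentioned in item 4 above can be approximated by hand as record size multiplied by the number of samples per day. The sketch below is a back-of-the-envelope calculation under that assumption; the function names are hypothetical, it ignores metadata and index files, and it is not necessarily the calculation pmlogger(1) performs.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_bytes(record_bytes: int, interval_seconds: float) -> float:
    """Bytes added per day by one group logged every interval_seconds."""
    if interval_seconds <= 0:
        raise ValueError("logging interval must be positive")
    return record_bytes * (SECONDS_PER_DAY / interval_seconds)


def estimate_archive_bytes(groups) -> float:
    """Sum the daily contribution of (record_bytes, interval_seconds) pairs."""
    return sum(daily_bytes(size, interval) for size, interval in groups)
```

For example, a group with a 1000-byte record logged once a minute contributes 1000 * 1440 bytes, roughly 1.4 Mbytes, per day.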
-
- 3.2 Collector Changes
-
- The following changes affect PMCD and the PMDAs that provide
- the collection services.
-
-
- 3.2.1 Libirixpmda Changes for IRIX 6.5
-
- The following incidents were resolved for IRIX 6.5.13.
-
- 616514 Export some bufview reported metrics.
-
- 815636 Fix network.interface.baudrate scale to match units.
-
- 822285 Export CXFS metrics if SGI_IS_OS_CELLULAR.
-
- 825330 Export event counter metrics for R12K & R14K.
-
- 826783 Export stats for meta and repeater routers on SGI
- Origin 3000 Series systems.
-
- The following incidents were resolved for IRIX 6.5.12.
-
- 814585 Export vnode freelist metrics.
-
- The following incidents were resolved for IRIX 6.5.11.
-
- 789419 Export tpsc metrics.
-
- 807502 Export gfxinfo metrics.
-
- 807799 Fix handling of 1394 disk names.
-
- The following incidents were resolved for IRIX 6.5.10.
-
- 794983 Fix hinv.cputype for RM5271 and RM7000 cpus.
-
- 785163 Export metric for kernel memory per node.
-
- The following incidents were resolved for IRIX 6.5.9.
-
- 790121 Provide support for SN1.
-
- No significant libirixpmda changes were made for IRIX 6.5.7
- or 6.5.8.
-
- The following incidents were resolved for IRIX 6.5.6.
-
- 764170 Provide support for Fiber Channel disks.
-
- The following incidents were resolved for IRIX 6.5.5.
-
- 649767 Export metrics for streams data, which are also
- exported by netstat -m.
-
- 682896 The semantics of the metrics of
- xbow.{port|total}.{src|dst} have changed from
- reporting transfer of bytes to transfer of
- micropackets as it is impossible to tell how many
- bytes of data are really transferred.
-
- The following incidents were resolved for IRIX 6.5.4.
-
- 675673 Export some additional xfs inode cluster metrics.
-
- The following incidents were resolved for IRIX 6.5.3.
-
- 628012 Export wait I/O metrics.
-
- The following incidents were resolved for IRIX 6.5.2.
-
- 558773 Export metrics for the instantaneous disk queue
- length and for the running sum of the disk queue
- lengths.
-
- The following incidents were resolved for IRIX 6.5.1.
-
- 588158 A section called "Enabling of Statistics Collection"
- has been added to the libirixpmda(5) man page.
-
- 603178 Extra diagnostic messages were added to log the
- state changes of turning the xlv statistics
- gathering on or off.
-
- 3.2.2 Other Changes for IRIX 6.5
-
- The following incidents were resolved for IRIX 6.5.13.
-
- 814533 Support added for instrumentation of activity from
- the kernel's cluster infrastructure heartbeat
- services, as used by CXFS and FailSafe.
-
- 820896 A new mmv PMDA was added to support light-weight
- export of performance data from system daemons.
-
- 822509 Export activity statistics from the cluster
- infrastructure fs2d(1) daemon using the mmv PMDA.
-
- 823395 Export activity statistics from the cluster
- infrastructure daemons clconfd(1), crsd(1), cad(1)
- and the cad plugins.
-
- The following incidents were resolved for IRIX 6.5.12.
-
- 813494 A logic error in handling error returns from some
- system calls caused the xvm PMDA to fail to
- correctly enumerate the instance domain of XVM
- volume elements and physical volumes.
-
- The following incidents were resolved for IRIX 6.5.11.
-
- 801248 Some of the memory metrics from the proc PMDA were
- susceptible to premature overflow in intermediate
- results.
-
- The following incidents were resolved for IRIX 6.5.10.
-
- 782226 Add ``job id'' to proc PMDA.
-
-
- 3.2.3 PCP 2.1 to PCP 2.2
-
- 1. The interface provided by libpcp_pmda between PMCD and
- a PMDA has been extended to support a new protocol
- (PMDA_INTERFACE_3), with the following semantics for
- the return codes from pmdaFetch(3) callbacks:
-
- _______________________________________________________________________________
- Interface          Return Value   Meaning
- _______________________________________________________________________________
- PMDA_INTERFACE_1   < 0            Value is an error code (e.g. PM_ERR_PMID,
- or                                PM_ERR_INST or PM_ERR_AGAIN)
- PMDA_INTERFACE_2   >= 0           Success
- _______________________________________________________________________________
- PMDA_INTERFACE_3   < 0            Value is an error code (e.g. PM_ERR_PMID,
-                                   PM_ERR_INST or PM_ERR_AGAIN)
-                    0              The metric value is not currently available
-                    > 0            Success
- _______________________________________________________________________________
-
- These changes allow more detail to be passed back from
- the PMDA to the clients via PMCD in the cases where
- metric values are legitimately not currently available
- (as opposed to some error condition preventing the
- metric value from being fetched).
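The return code semantics described above can be summarized in a few lines. The sketch below is a plain Python restatement of the table, with the hypothetical function classify_fetch_result standing in for whatever logic consumes the callback status; it is not code from libpcp_pmda.

```python
def classify_fetch_result(interface_version: int, sts: int) -> str:
    """Map a fetch callback return code to its meaning per interface version."""
    if interface_version not in (1, 2, 3):
        raise ValueError("unknown PMDA interface version")
    if sts < 0:
        # Negative values are error codes under every interface version.
        return "error"
    if interface_version == 3 and sts == 0:
        # Only PMDA_INTERFACE_3 distinguishes "no value right now" from success.
        return "unavailable"
    return "success"
```

Note that a return value of 0 means success under the two older interfaces but "not currently available" under PMDA_INTERFACE_3, so a fetch callback moved to the new interface needs its success path reviewed.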
-
- 3.2.4 PCP 2.0 to PCP 2.1
-
- 1. The proc agent has been changed to use
- /proc/pinfo/xxxx if possible and only use /proc/xxxx
- if there is no alternative. Previously this agent
- always used /proc/xxxx to extract process information,
- which caused unnecessary access checking and led to
- reports of NFS contention problems.
-
- 2. The new espping PMDA provides quality of service
- metrics for consumption by the Embedded Support
- Partner (ESP) infrastructure (released in IRIX 6.5.5).
- This PMDA can be used in conjunction with pmie(1)
- rules generated by the new pmieconf(1) tool to detect
- service failure on either local or remote hosts.
- Among the services which can be probed are ICMP, SMTP,
- NNTP, pmcd, and local HIPPI interfaces using the new
- hipprobe(1) utility.
-
- 3. The way pmcd(1) and pmlogger(1) are started from
- /etc/init.d/pcp has changed.
-
- a. When pmlogger is chkconfig'd on, pmlogger
- instances are launched in the background from
- /etc/init.d/pcp start, as this helps faster
- system reboots. As a result, some diagnostics
- from pmlogger and/or
- /usr/pcp/bin/pmlogger_check that previously
- appeared when /etc/init.d/pcp was run are now
- generated asynchronously; any such messages are
- forwarded to the root user as e-mail. These
- messages are in addition to those already
- written to /var/adm/pcp/NOTICES by pmpost(1)
- from /etc/init.d/pcp.
-
- b. A new utility, pmcd_wait(1), provides a more
- reliable mechanism for detecting that pmcd is
- ready to accept client connections.
-
- 4. In concert with changes to pmie, the pmcd PMDA has
- been extended to export information about executing
- pmie instances and their progress in terms of rule
- evaluations and action execution rates. Refer to the
- pmcd.pmie.* metrics.
-
- 3.3 Monitor Changes
-
- The major additions and changes for the performance
- visualization and analysis tools are described below.
-
-
- 3.3.1 Changes for IRIX 6.5
-
- The following incidents were resolved for IRIX 6.5.13.
-
- 807561 The semantics for the -i and -I options to
- pmprobe(1) were changed to allow all instances (not
- just the ones found at the next pmFetch(3)) to be
- reported.
-
- 814452 Timestamps have been added to the pmie(1) output
- when the -v option is used.
-
- 823023 Added the new topio(1) tool to measure process-level
- demand for I/O bandwidth.
-
- The following incidents were resolved for IRIX 6.5.12.
-
- 815387 The list of instances reported by pmval(1) was not
- being sorted, and this caused some confusion for
- metrics with an underlying instance domain that
- changed over time.
-
- The following incidents were resolved for IRIX 6.5.11.
-
- 803336 Better creation of pmafm(1) archive folios from the
- ``record'' mode of oview(1) for both SGI Origin 2000
- and Origin 3000 Series systems.
-
- The following incidents were resolved for IRIX 6.5.9.
-
- 790122 Add support for SGI Origin 3000 Series systems in
- oview(1).
-
- The following incidents were resolved for IRIX 6.5.8.
-
- 776214 Better handling of error return codes for telnet
- commands used by the espping and shping PMDAs.
-
- 781065 The generic pmie rules supported by pmieconf have
- been extended to allow alarm notification to be
- passed to EnlightenDSM.
-
-
- 3.3.2 PCP 2.1 to PCP 2.2
-
- 1. Support for the SGI Origin 3000 Series servers has
- been added with new visualisation features specific
- for these servers, and a complete re-write of the
- oview(1) monitoring application.
-
- 3.3.3 PCP 2.0 to PCP 2.1
-
- 1. pmie
-
- a. A syntactic restriction in the specification
- language has been relaxed, and actions may now
- have an arbitrary number of quoted arguments
- (previously at most two arguments were allowed).
- At the same time a problem with the syslog
- action was resolved, allowing the -p option to
- be passed to logger(1). For example, this is
- now valid:
- some_inst (
- (100 * filesys.used / filesys.capacity) > 98 )
- -> syslog "-p daemon.info 'file system close to full"
- " %h:[%i] %v% " "'";
-
- b. Metrics with dynamic instance domains are now
- correctly handled by pmie. Previously pmie
- instantiated the instance domain when it
- started, and was oblivious to any subsequent
- changes in the instance domain. This is most
- useful for rules using the metrics of the
- hotproc PMDA that is part of the pcp product.
-
- c. The pmie language has been extended to allow two
- new operators match_inst and nomatch_inst that
- take a regular expression and a boolean
- expression. The result is the boolean AND of
- the expression and the result of matching (or
- not matching) the associated instance name
- against the regular expression.
-
- For example, this rule evaluates error rates on
- various 10BaseT Ethernet network interfaces
- (e.g. ecN, etN or efN):
- some_inst
- match_inst "^(ec|et|ef)"
- network.interface.total.errors > 10 count/sec
- -> syslog "Ethernet errors:" " %i";
- The following rule evaluates available free
- space for all filesystems except the root
- filesystem:
- some_inst
- nomatch_inst "/dev/root"
- filesys.free < 10 Mbytes
- -> print "Low filesystem free (Mb):" " [%i]:%v";
-
- d. During rule evaluation, pmie keeps track of the
- expected number of rule evaluations, number of
- rules actually evaluated, the number of
- predicates that are true and false, the number
- of actions executed, etc. These statistics are
- maintained as binary data structures in the
- mmap'ed files /var/tmp/pmie/<pid>. If pmie is
- running on a system with a PCP collector
- deployment, the pmcd PMDA exports these metrics
- via the new pmcd.pmie.* group of metrics.
-
- e. Some restrictions on the expansion of macros
- (e.g. $name) have been removed, so macro
- expansion can occur anywhere in the pmie rule
- specifications.
-
- f. There have been some changes to improve the
- formatting of numeric values reported with the
- options -v, -V and -W, and for the expansion of
- %v in actions. In general terms these have
- removed extra white space and reduced the
- likelihood of scientific notation being used.
-
- 2. A set of parameterized pmie rules has been developed.
- These rules are applicable to most systems and allow
- pmie to be used by new users without knowledge of the
- pmie syntax. A new utility, pmieconf(1), has been
- built which allows these rules to be enabled or
- disabled, or the parameters and thresholds adjusted
- for a specific system.
-
- The combination of pmie, pmieconf, pmie_check and
- /etc/init.d/pmie provides the infrastructure required
- for PCP to search for behavior indicative of
- performance problems in a fully automated manner with
- little or no local customization required. Where
- customization is needed, pmieconf(1) provides a
- convenient way of doing this.
-
- 3. A new utility, hipprobe(1), has been added which will
- check the status of HIPPI interfaces on a system.
- More sophisticated monitoring of HIPPI interfaces will
- be supported with a hippi PMDA that will be released
- as part of the forthcoming PCP for HPC add-on product.
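The match_inst and nomatch_inst semantics described in item 1c above (the boolean AND of the expression and the result of matching the instance name) can be modelled outside pmie for experimentation. The sketch below is an approximation only: the helper names are hypothetical, Python's re module stands in for the regular expression library used by pmie, and the instance names and values are made up.

```python
import re


def match_inst(pattern, values, predicate, negate=False):
    """Per-instance AND of predicate(value) and the name match.

    With negate=True this models nomatch_inst instead of match_inst.
    """
    regex = re.compile(pattern)
    result = {}
    for name, value in values.items():
        matched = regex.search(name) is not None
        if negate:
            matched = not matched
        result[name] = matched and predicate(value)
    return result


def some_inst(results):
    """True if the expression holds for at least one instance."""
    return any(results.values())


# Made-up error rates (count/sec) keyed by interface name.
rates = {"ec0": 12.0, "lo0": 50.0, "ef1": 3.0}
ethernet = match_inst("^(ec|et|ef)", rates, lambda v: v > 10)
```

Here only ec0 both matches the pattern and exceeds the threshold, so some_inst(ethernet) is true; lo0 exceeds the threshold but is excluded by the name match.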
-
- 3.4 Features Removed or Deprecated
-
- In PCP 2.2, the following features from earlier PCP versions
- have been removed or are deprecated.
-
- None.